[1]:
%run ../initscript.py
HTML("""
<div id="popup" style="padding-bottom:5px; display:none;">
    <div>Enter Password:</div>
    <input id="password" type="password"/>
    <button onclick="done()" style="border-radius: 12px;">Submit</button>
</div>
<button onclick="unlock()" style="border-radius: 12px;">Unclock</button>
""")
# <a href="#" onclick="code_toggle(this); return false;">show code</a>
[1]:
[2]:
%run loadfuncs.py
from ipywidgets import *
%matplotlib inline
import warnings
warnings.filterwarnings('ignore')
toggle()
[2]:

Seasonal ARIMA (SARIMA) Models

The ARIMA model does not support seasonality. If the time series data has defined seasonality, then we need to perform seasonal differencing and SARIMA models.

Seasonal differencing is similar to regular differencing, but, instead of subtracting consecutive terms, we subtract the value from previous season.

The model is represented as SARIMA\((p,d,q)x(P,D,Q)\), where - \(D\) is the order of seasonal differencing - \(P\) is the order of seasonal autoregression (SAR) - \(Q\) is the order of seasonal moving average (SMA) - \(x\) is the frequency of the time series.

[3]:
fig, axes = plt.subplots(2, 1, figsize=(10,5), dpi=100, sharex=True)

# Usual Differencing
axes[0].plot(df_drink.sales, label='Original Series')
axes[0].plot(df_drink.sales.diff(1), label='Usual Differencing')
axes[0].set_title('Usual Differencing')
axes[0].legend(loc='upper left', fontsize=10)


# Seasinal Differencing
axes[1].plot(df_drink.sales, label='Original Series')
axes[1].plot(df_drink.sales.diff(4), label='Seasonal Differencing', color='green')
axes[1].set_title('Seasonal Differencing')
plt.legend(loc='upper left', fontsize=10)
plt.suptitle('Drink Sales', fontsize=16)
plt.show()

toggle()
../../_images/docs_forecasting_sarima_4_0.png
[3]:

As we can clearly see, the seasonal spikes is intact after applying usual differencing (lag 1). Whereas, it is rectified after seasonal differencing.

Build SARIMA Model

Find optimal SARIMA for house sales:

[4]:
sarima_house = pm.auto_arima(df_house.sales,
                       start_p=1, start_q=1,
                       test='adf',
                       max_p=3, max_q=3, m=4,
                       start_P=0, seasonal=True,
                       d=None, D=1, trace=True,
                       error_action='ignore',
                       suppress_warnings=True,
                       stepwise=True)

sarima_house.summary()
Fit ARIMA: order=(1, 0, 1) seasonal_order=(0, 1, 1, 4); AIC=3069.410, BIC=3087.760, Fit time=0.437 seconds
Fit ARIMA: order=(0, 0, 0) seasonal_order=(0, 1, 0, 4); AIC=3317.294, BIC=3324.633, Fit time=0.018 seconds
Fit ARIMA: order=(1, 0, 0) seasonal_order=(1, 1, 0, 4); AIC=3167.461, BIC=3182.140, Fit time=0.337 seconds
Fit ARIMA: order=(0, 0, 1) seasonal_order=(0, 1, 1, 4); AIC=3242.302, BIC=3256.981, Fit time=0.293 seconds
Fit ARIMA: order=(1, 0, 1) seasonal_order=(1, 1, 1, 4); AIC=3068.582, BIC=3090.601, Fit time=0.769 seconds
Fit ARIMA: order=(1, 0, 1) seasonal_order=(1, 1, 0, 4); AIC=3144.942, BIC=3163.292, Fit time=0.491 seconds
Fit ARIMA: order=(1, 0, 1) seasonal_order=(1, 1, 2, 4); AIC=3068.586, BIC=3094.275, Fit time=1.255 seconds
Fit ARIMA: order=(1, 0, 1) seasonal_order=(0, 1, 0, 4); AIC=3236.794, BIC=3251.474, Fit time=0.160 seconds
Fit ARIMA: order=(1, 0, 1) seasonal_order=(2, 1, 2, 4); AIC=3071.836, BIC=3101.195, Fit time=1.140 seconds
Fit ARIMA: order=(0, 0, 1) seasonal_order=(1, 1, 1, 4); AIC=3235.957, BIC=3254.306, Fit time=0.466 seconds
Fit ARIMA: order=(2, 0, 1) seasonal_order=(1, 1, 1, 4); AIC=3070.496, BIC=3096.185, Fit time=0.637 seconds
Fit ARIMA: order=(1, 0, 0) seasonal_order=(1, 1, 1, 4); AIC=3090.901, BIC=3109.250, Fit time=0.454 seconds
Fit ARIMA: order=(1, 0, 2) seasonal_order=(1, 1, 1, 4); AIC=3070.548, BIC=3096.237, Fit time=0.978 seconds
Fit ARIMA: order=(0, 0, 0) seasonal_order=(1, 1, 1, 4); AIC=3313.310, BIC=3327.989, Fit time=0.350 seconds
Fit ARIMA: order=(2, 0, 2) seasonal_order=(1, 1, 1, 4); AIC=nan, BIC=nan, Fit time=nan seconds
Fit ARIMA: order=(1, 0, 1) seasonal_order=(2, 1, 1, 4); AIC=3069.171, BIC=3094.860, Fit time=0.710 seconds
Total fit time: 8.514 seconds
[4]:
Statespace Model Results
Dep. Variable: y No. Observations: 294
Model: SARIMAX(1, 0, 1)x(1, 1, 1, 4) Log Likelihood -1528.291
Date: Mon, 29 Apr 2019 AIC 3068.582
Time: 16:45:28 BIC 3090.601
Sample: 0 HQIC 3077.404
- 294
Covariance Type: opg
coef std err z P>|z| [0.025 0.975]
intercept 0.0002 0.129 0.002 0.998 -0.254 0.254
ar.L1 0.9997 0.143 6.994 0.000 0.720 1.280
ma.L1 -0.3049 0.053 -5.765 0.000 -0.409 -0.201
ar.S.L4 -0.1031 0.056 -1.848 0.065 -0.212 0.006
ma.S.L4 -0.9982 0.647 -1.542 0.123 -2.267 0.271
sigma2 2109.4165 1043.105 2.022 0.043 64.968 4153.865
Ljung-Box (Q): 68.41 Jarque-Bera (JB): 5.09
Prob(Q): 0.00 Prob(JB): 0.08
Heteroskedasticity (H): 0.55 Skew: -0.14
Prob(H) (two-sided): 0.00 Kurtosis: 3.59


Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).

Find optimal SARIMA for drink sales

[5]:
sarima_drink = pm.auto_arima(df_drink.sales,
                       start_p=1, start_q=1,
                       test='adf',
                       max_p=3, max_q=3, m=4,
                       start_P=0, seasonal=True,
                       d=None, D=1, trace=True,
                       error_action='ignore',
                       suppress_warnings=True,
                       stepwise=True)

sarima_drink.summary()
Fit ARIMA: order=(1, 0, 1) seasonal_order=(0, 1, 1, 4); AIC=809.758, BIC=820.230, Fit time=0.160 seconds
Fit ARIMA: order=(0, 0, 0) seasonal_order=(0, 1, 0, 4); AIC=848.072, BIC=852.261, Fit time=0.013 seconds
Fit ARIMA: order=(1, 0, 0) seasonal_order=(1, 1, 0, 4); AIC=813.490, BIC=821.867, Fit time=0.063 seconds
Fit ARIMA: order=(0, 0, 1) seasonal_order=(0, 1, 1, 4); AIC=813.808, BIC=822.185, Fit time=0.142 seconds
Fit ARIMA: order=(1, 0, 1) seasonal_order=(1, 1, 1, 4); AIC=811.146, BIC=823.712, Fit time=0.378 seconds
Fit ARIMA: order=(1, 0, 1) seasonal_order=(0, 1, 0, 4); AIC=810.383, BIC=818.760, Fit time=0.138 seconds
Fit ARIMA: order=(1, 0, 1) seasonal_order=(0, 1, 2, 4); AIC=811.498, BIC=824.064, Fit time=0.218 seconds
Fit ARIMA: order=(1, 0, 1) seasonal_order=(1, 1, 2, 4); AIC=813.467, BIC=828.127, Fit time=0.312 seconds
Fit ARIMA: order=(2, 0, 1) seasonal_order=(0, 1, 1, 4); AIC=811.201, BIC=823.767, Fit time=0.299 seconds
Fit ARIMA: order=(1, 0, 0) seasonal_order=(0, 1, 1, 4); AIC=813.022, BIC=821.399, Fit time=0.130 seconds
Fit ARIMA: order=(1, 0, 2) seasonal_order=(0, 1, 1, 4); AIC=810.926, BIC=823.492, Fit time=0.345 seconds
Fit ARIMA: order=(0, 0, 0) seasonal_order=(0, 1, 1, 4); AIC=849.704, BIC=855.987, Fit time=0.071 seconds
Fit ARIMA: order=(2, 0, 2) seasonal_order=(0, 1, 1, 4); AIC=813.065, BIC=827.725, Fit time=0.292 seconds
Total fit time: 2.571 seconds
[5]:
Statespace Model Results
Dep. Variable: y No. Observations: 64
Model: SARIMAX(1, 0, 1)x(0, 1, 1, 4) Log Likelihood -399.879
Date: Mon, 29 Apr 2019 AIC 809.758
Time: 16:45:30 BIC 820.230
Sample: 0 HQIC 813.854
- 64
Covariance Type: opg
coef std err z P>|z| [0.025 0.975]
intercept 121.9284 50.994 2.391 0.017 21.981 221.876
ar.L1 0.4436 0.178 2.492 0.013 0.095 0.793
ma.L1 0.5287 0.143 3.687 0.000 0.248 0.810
ma.S.L4 -0.2772 0.141 -1.969 0.049 -0.553 -0.001
sigma2 3.487e+04 6857.056 5.085 0.000 2.14e+04 4.83e+04
Ljung-Box (Q): 28.74 Jarque-Bera (JB): 0.83
Prob(Q): 0.91 Prob(JB): 0.66
Heteroskedasticity (H): 1.28 Skew: -0.29
Prob(H) (two-sided): 0.58 Kurtosis: 3.03


Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).

SARIMA Forecast

[6]:
sarima_forcast(sarima_house, df_house, 'sales', forecast_periods=24, freq='month')
../../_images/docs_forecasting_sarima_12_0.png
[7]:
sarima_forcast(sarima_drink, df_drink, 'sales', forecast_periods=24, freq='quarter')
../../_images/docs_forecasting_sarima_13_0.png

SARIMAX Model with Exogenous Variable

We have a SARIMA model if there is an external predictor, also called, “exogenous variable” built into SARIMA models. The only requirement to use an exogenous variable is that we need to know the value of the variable during the forecast period as well.

For the sake of demonstration, we use the seasonal index from the classical seasonal decomposition on the latest 3 years of data even though SARIMA already modeling the seasonality.

The seasonal index is a good exogenous variable for demonstration purpose because it repeats every frequency cycle, 4 quarters in this case. So, we always know what values the seasonal index will hold for the future forecasts.

[8]:
df_drink = add_seasonal_index(df_drink, 'sales', freq='quarter', model='multiplicative')
sarimax_drink = pm.auto_arima(df_drink[['sales']], exogenous=df_drink[['seasonal_index']],
                           start_p=1, start_q=1,
                           test='adf',
                           max_p=3, max_q=3, m=12,
                           start_P=0, seasonal=True,
                           d=None, D=1, trace=True,
                           error_action='ignore',
                           suppress_warnings=True,
                           stepwise=True)

sarimax_drink.summary()
Fit ARIMA: order=(1, 1, 1) seasonal_order=(0, 1, 1, 12); AIC=727.512, BIC=739.103, Fit time=0.493 seconds
Fit ARIMA: order=(0, 1, 0) seasonal_order=(0, 1, 0, 12); AIC=730.208, BIC=736.003, Fit time=0.014 seconds
Fit ARIMA: order=(1, 1, 0) seasonal_order=(1, 1, 0, 12); AIC=726.746, BIC=736.405, Fit time=0.269 seconds
Fit ARIMA: order=(0, 1, 1) seasonal_order=(0, 1, 1, 12); AIC=728.526, BIC=738.185, Fit time=0.261 seconds
Fit ARIMA: order=(1, 1, 0) seasonal_order=(0, 1, 0, 12); AIC=732.167, BIC=739.895, Fit time=0.029 seconds
Fit ARIMA: order=(1, 1, 0) seasonal_order=(2, 1, 0, 12); AIC=728.728, BIC=740.319, Fit time=0.960 seconds
Fit ARIMA: order=(1, 1, 0) seasonal_order=(1, 1, 1, 12); AIC=nan, BIC=nan, Fit time=nan seconds
Fit ARIMA: order=(1, 1, 0) seasonal_order=(2, 1, 1, 12); AIC=nan, BIC=nan, Fit time=nan seconds
Fit ARIMA: order=(0, 1, 0) seasonal_order=(1, 1, 0, 12); AIC=724.752, BIC=732.479, Fit time=0.298 seconds
Fit ARIMA: order=(0, 1, 1) seasonal_order=(1, 1, 0, 12); AIC=726.648, BIC=736.307, Fit time=0.348 seconds
Fit ARIMA: order=(1, 1, 1) seasonal_order=(1, 1, 0, 12); AIC=728.330, BIC=739.921, Fit time=0.390 seconds
Fit ARIMA: order=(0, 1, 0) seasonal_order=(2, 1, 0, 12); AIC=726.757, BIC=736.416, Fit time=0.563 seconds
Fit ARIMA: order=(0, 1, 0) seasonal_order=(1, 1, 1, 12); AIC=nan, BIC=nan, Fit time=nan seconds
Fit ARIMA: order=(0, 1, 0) seasonal_order=(2, 1, 1, 12); AIC=nan, BIC=nan, Fit time=nan seconds
Total fit time: 3.655 seconds
[8]:
Statespace Model Results
Dep. Variable: y No. Observations: 64
Model: SARIMAX(0, 1, 0)x(1, 1, 0, 12) Log Likelihood -358.376
Date: Mon, 29 Apr 2019 AIC 724.752
Time: 16:45:34 BIC 732.479
Sample: 0 HQIC 727.705
- 64
Covariance Type: opg
coef std err z P>|z| [0.025 0.975]
intercept 5.4344 44.502 0.122 0.903 -81.787 92.656
x1 -0.0406 3.01e+04 -1.35e-06 1.000 -5.9e+04 5.9e+04
ar.S.L12 -0.4401 0.151 -2.922 0.003 -0.735 -0.145
sigma2 7.164e+04 1.95e+04 3.671 0.000 3.34e+04 1.1e+05
Ljung-Box (Q): 52.27 Jarque-Bera (JB): 2.60
Prob(Q): 0.09 Prob(JB): 0.27
Heteroskedasticity (H): 1.70 Skew: 0.54
Prob(H) (two-sided): 0.28 Kurtosis: 2.74


Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).
[9]:
sarimax_forcast(sarimax_drink, df_drink, 'sales', forecast_periods=24, freq='quarter')
../../_images/docs_forecasting_sarima_17_0.png

The coefficient of x1 is small and p-value is large, so the contribution from seasonal index is negligible.